Feat/binning transformation by raivo-otus · Pull Request #801 · microbiome/mia

raivo-otus · 2026-01-15T14:04:32Z

Adds a quantile-based binning transformation to mia::transformAssay(), as discussed in #800.
The default number of bins is 4, which roughly reflects a "rare, low, medium, high" division that is easy to understand.
Unit tests include checks for both sample- and feature-wise transforms.
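A minimal sketch of what quantile-based binning does to an assay, written in Python for illustration only (the actual implementation is R code in mia and is not reproduced here). It assumes a features x samples matrix stored as a list of rows, R's default (type 7) quantile interpolation, and right-closed intervals as with cut(..., include.lowest = TRUE). The helper names quantile_bin and bin_assay and the margin argument are hypothetical.

```python
import bisect
import statistics


def quantile_bin(values, nbins=4):
    """Assign each value to a quantile bin numbered 1..nbins (hypothetical helper)."""
    if len(values) < 2:
        raise ValueError("need at least two values to compute quantiles")
    # nbins - 1 inner cut points; method="inclusive" matches R's default
    # quantile() interpolation (type 7).
    inner = statistics.quantiles(values, n=nbins, method="inclusive")
    # Right-closed intervals: a value equal to a cut point goes to the lower bin.
    return [bisect.bisect_left(inner, v) + 1 for v in values]


def bin_assay(assay, nbins=4, margin="samples"):
    """Bin a features x samples matrix (list of rows) per sample or per feature."""
    if margin == "features":
        # Each row (feature) is binned across samples.
        return [quantile_bin(list(row), nbins) for row in assay]
    if margin == "samples":
        # Each column (sample) is binned across features, then transposed back.
        binned_cols = [quantile_bin(list(col), nbins) for col in zip(*assay)]
        return [list(row) for row in zip(*binned_cols)]
    raise ValueError("margin must be 'samples' or 'features'")


counts = [[0, 3], [5, 1], [12, 8], [40, 2]]  # 4 features x 2 samples
print(bin_assay(counts, margin="samples"))  # [[1, 3], [2, 1], [3, 4], [4, 2]]
```

With only four features per sample each bin receives exactly one feature; on real data with hundreds of features each bin holds roughly a quarter of them.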

Should the transformation default to using the "relabundance" assay, or leave the choice to the user?

Pending tasks:

Add documentation
Add information on binning transformation to OMA

Potential optimizations:

A parallel version, e.g. with doFuture, could be beneficial for large datasets.
A C++ implementation called via Rcpp would be even better, for both serial and parallel performance.

TuomasBorman

Looks good, a couple of comments.

R/transformCounts.R

TuomasBorman · 2026-01-21T10:39:36Z

Should the transformation default to using the "relabundance" assay, or leave the choice to the user?

As the binning is based on ranks, shouldn't counts and relabundance lead to the same result?
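On this question: assuming "relabundance" means each sample divided by its own total, the answer depends on the binning direction. Sample-wise binning depends only on the ordering within a sample, which a positive per-sample scaling cannot change; feature-wise binning compares values across samples, and those are divided by different totals. A small Python sketch with toy data and hypothetical helper names shows both cases.

```python
def ranks(values):
    """1-based ranks; ties keep their original order (stable sort)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position + 1
    return result


def relabundance(assay):
    """Divide each sample (column) by its total; assumes no all-zero samples."""
    totals = [sum(col) for col in zip(*assay)]
    return [[value / totals[j] for j, value in enumerate(row)] for row in assay]


# features x samples: 2 features, 2 samples with very different depths
counts = [[10, 20],
          [90, 980]]
rel = relabundance(counts)

# Sample-wise: ranks within each column are unchanged by the scaling.
sample_counts = [ranks(list(col)) for col in zip(*counts)]
sample_rel = [ranks(list(col)) for col in zip(*rel)]
print(sample_counts == sample_rel)  # True

# Feature-wise: ranks within a row can flip, because each column
# is divided by a different total.
feature_counts = [ranks(row) for row in counts]
feature_rel = [ranks(row) for row in rel]
print(feature_counts[0], feature_rel[0])  # [1, 2] [2, 1]
```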

TuomasBorman · 2026-01-21T10:40:20Z

Also, as discussed over lunch: you could update OMA's ML chapter if binning improves accuracy.

raivo-otus · 2026-01-21T11:42:53Z

Should the transformation default to using the "relabundance" assay, or leave the choice to the user?

As the binning is based on ranks, shouldn't counts and relabundance lead to the same result?

My concern is with binning e.g. CLR-transformed values, which produces unexpected bins. In that sense, some sort of check that the input is relabundance or counts would act as a 'safety net'.

TuomasBorman · 2026-01-21T11:50:05Z

Ahh, yes. Maybe you could check that the values are positive and throw an error if not, as the result would not make any sense.
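A minimal sketch of such a check in Python, for illustration only. It rejects negative values but allows zeros, since counts and relative abundances routinely contain them (the later commits also talk about "binning negative values"). CLR values in a sample sum to zero, so any non-constant sample contains negative entries, which is why this check catches CLR input. The names check_non_negative and clr are hypothetical, and clr here is a bare textbook CLR without the pseudocount handling a real package would need.

```python
import math


def check_non_negative(assay):
    """Raise ValueError if any value in a features x samples matrix is negative.

    Hypothetical helper. Zeros are allowed, because counts and relative
    abundances routinely contain them.
    """
    for i, row in enumerate(assay):
        for j, value in enumerate(row):
            if value < 0:
                raise ValueError(
                    f"negative value {value} (feature {i}, sample {j}); "
                    "bin a counts or relabundance assay instead"
                )


def clr(values):
    """Centered log-ratio of strictly positive values (illustration only)."""
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


check_non_negative([[0, 4], [7, 0]])  # counts with zeros: passes

try:
    check_non_negative([clr([1, 2, 4])])  # CLR values are centred on zero
except ValueError as err:
    print("rejected:", err)
```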


antagomir · 2026-01-26T13:06:30Z

Standard binning in R is done with the function cut(). It is widely used and has many useful arguments. I'm wondering whether that should be supported, or more generally whether multiple binning options should be supported, with their differences explained in the documentation.

But that can be another PR.

R/transformCounts.R

raivo-otus · 2026-01-26T13:25:07Z

Standard binning in R is done with the function cut(). It is widely used and has many useful arguments. I'm wondering whether that should be supported, or more generally whether multiple binning options should be supported, with their differences explained in the documentation.

But that can be another PR.

I'll look into using cut(). It should be possible to implement quantile-based binning with cut() as well, and built-in functions are most likely faster where they can be used.
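One practical detail when moving to cut() (an observation, not something stated in this thread): in zero-inflated samples several quantiles coincide, and R's cut() stops with "'breaks' are not unique" when the break vector contains duplicates. The Python sketch below computes the break vector that would be handed to cut() (minimum, inner quantiles, maximum) and drops repeated values; the helper names are hypothetical, and whether the PR handles ties this way is not stated here.

```python
import statistics


def quantile_breaks(values, nbins=4):
    """Full break vector (min, inner quantiles, max), i.e. what would go to cut()."""
    inner = statistics.quantiles(values, n=nbins, method="inclusive")
    return [min(values)] + inner + [max(values)]


def drop_repeated_breaks(breaks):
    """Keep each break value once; breaks are already sorted."""
    unique = []
    for b in breaks:
        if not unique or b != unique[-1]:
            unique.append(b)
    return unique


sparse_sample = [0, 0, 0, 0, 0, 3, 8, 15]  # zero-inflated, as microbiome data often is
breaks = quantile_breaks(sparse_sample)
print(breaks)                        # [0, 0.0, 0.0, 4.25, 15]
print(drop_repeated_breaks(breaks))  # [0, 4.25, 15]
```

After deduplication this sample ends up with two bins instead of four, so the number of bins can differ between samples; that trade-off may be worth mentioning in the documentation.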

I think testing different binning methods would be beneficial. The quantile binning approach is supported by the BiomeGPT paper, but there are of course other options, such as simple equal-width bins. If other binning options turn out to be useful, it seems logical to add them in a separate PR.
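To illustrate the difference between quantile bins and equal-width bins on skewed, abundance-like data, a small Python sketch with toy data and hypothetical helper names. Equal-width bins split the range between the minimum and maximum evenly, so a single dominant value pushes almost everything into the lowest bin, whereas quantile bins keep the bin sizes balanced.

```python
import bisect
import statistics
from collections import Counter


def assign(values, inner_breaks):
    """Right-closed bins 1..len(inner_breaks) + 1."""
    return [bisect.bisect_left(inner_breaks, v) + 1 for v in values]


def equal_width_inner(values, nbins):
    """nbins - 1 evenly spaced cut points between min and max."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / nbins
    return [lo + k * step for k in range(1, nbins)]


def quantile_inner(values, nbins):
    """nbins - 1 quantile cut points (R type 7 interpolation)."""
    return statistics.quantiles(values, n=nbins, method="inclusive")


def bin_sizes(bins, nbins):
    """Number of values falling in each bin 1..nbins."""
    counts = Counter(bins)
    return [counts[k] for k in range(1, nbins + 1)]


skewed = [1, 1, 2, 2, 3, 5, 8, 100]  # one dominant outlier
print(bin_sizes(assign(skewed, equal_width_inner(skewed, 4)), 4))  # [7, 0, 0, 1]
print(bin_sizes(assign(skewed, quantile_inner(skewed, 4)), 4))     # [2, 2, 2, 2]
```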

TuomasBorman · 2026-02-09T12:56:44Z

@raivo-otus any updates?

raivo-otus · 2026-02-10T14:24:21Z

Added a literature reference to the BiomeGPT paper describing the binning strategy, and changed the implementation to use cut().

raivo-otus and others added 3 commits January 15, 2026 15:13

Add implementation of binning transformation

6a3d3ac

Add unit tests for binning transformation

b7b03dd

Merge branch 'devel' into feat/binning_transformation

d98d4e4

raivo-otus marked this pull request as draft January 15, 2026 14:05

TuomasBorman reviewed Jan 21, 2026


R/transformCounts.R (outdated, resolved)

R/transformCounts.R (outdated, resolved)

raivo-otus added 6 commits January 21, 2026 13:59

rename argument bins to nbins

a586a75

this renaming aligns with other functions in the package

missed rename of bins -> nbins

0e0a48d

rename bins to nbins in unit test as well

4691df1

Add unittest for binning negative values

6ed957c

add mention of option to function documentation

9a4f864

warning to error on attempt to bin negative values

3d7cef6

raivo-otus marked this pull request as ready for review January 22, 2026 10:05

fix test

3cd1f40

antagomir reviewed Jan 26, 2026


R/transformCounts.R (resolved)

raivo-otus added 3 commits February 10, 2026 15:56

add lit ref

1d4c092

implementation of logic with cut

7597721

Merge branch 'devel' into feat/binning_transformation

86091db

raivo-otus wants to merge 13 commits into microbiome:devel from raivo-otus:feat/binning_transformation
3 participants